{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature Engineering\n",
    "\n",
    "Create features that help the machine learning algorithms find patterns in the data, and therefore perform the classification task better than when the algorithms are trained on raw data. \n",
    "   \n",
    "Feature engineering transforms the raw data into a set of designed features. Cyclic-moment based features are widely used for the purpose of modulation recognition. A total of 32 features are designed in this notebook. The first 16 features are based on 0 cyclic time lag and the next 16 features are based on 8 cyclic time lag. The features are of the form \n",
    "\n",
    "\\begin{equation}\n",
    "s_{mn} = f_m\\big(\\left\\{x_i^nx_{i+T}^n\\right\\}\\big).\n",
    "\\end{equation}\n",
    "which is the the $m$th order statistic on the $n$th power of the instantaneous or time delayed received signal $x_i$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Import the required modules\n",
    "\n",
    "import numpy as np\n",
    "import pickle  \n",
    "import sklearn\n",
    "from sklearn import metrics\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "import os,random\n",
    "import sys, math, cmath\n",
    "from time import time\n",
    "from IPython.display import Markdown, display\n",
    "from collections import defaultdict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Original dataset**\n",
      "Dataset has modulation types ['8PSK', 'AM-DSB', 'AM-SSB', 'BPSK', 'CPFSK', 'GFSK', 'PAM4', 'QAM16', 'QAM64', 'QPSK', 'WBFM']\n",
      "and SNR values [-20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]\n",
      "   \n",
      "**New dataset**\n",
      "Limit dataset to 8 digital modulation schemes: ['8PSK', 'BPSK', 'CPFSK', 'GFSK', 'PAM4', 'QAM16', 'QAM64', 'QPSK']\n"
     ]
    }
   ],
   "source": [
    "#Load the dataset\n",
    "\n",
    "with open(\"RML2016.10a_dict.dat\",'rb') as f:\n",
    "    Xd = pickle.load(f, encoding='latin1')\n",
    "#we tell pickle.load() how to convert Python bytestring data to Python 3 strings, \n",
    "#or you can tell pickle to leave them as bytes.Setting the encoding to latin1 allows you to import the data directly\n",
    "\n",
    "snrs,mods = map(lambda j: sorted(list(set(map(lambda x: x[j], Xd.keys())))), [1,0])\n",
    "# in map(function, input) format, input = Xd.keys(); we feed to variable x, one key at a time\n",
    "# then we apply another map; from above, we got a list = [(mod type, SNR)]\n",
    "# so to each pair; when we do (mod type, SNR)[0] we get the mod type; (mod type, SNR)[1] gives the SNR value\n",
    "print('**Original dataset**')\n",
    "print(\"Dataset has modulation types\", mods)\n",
    "print(\"and SNR values\", snrs)\n",
    "\n",
    "#delete analogue modulation and keep only digital modulation schemes\n",
    "\n",
    "mods_to_rmv = ['AM-DSB', 'AM-SSB', 'WBFM']  #mod names we need to remove\n",
    "keys_to_rmv = [key for key in Xd.keys() if key[0] in mods_to_rmv]  #keys corresponding to all these mods\n",
    "# we remove each key containing each of the analog mod types\n",
    "Xd_digital = {key: Xd[key] for key in Xd if key not in keys_to_rmv}  #new dictionary containing only digital mod types\n",
    "snrs,mods = map(lambda j: sorted(list(set(map(lambda x: x[j], Xd_digital.keys())))), [1,0])\n",
    "print(\"   \")\n",
    "print('**New dataset**')\n",
    "print(\"Limit dataset to\",len(list(mods)), \"digital modulation schemes:\", mods)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Samples in dataset**\n",
      "Dictionary contains 160 keys, meaning 160 (mod, SNR) pairs\n",
      "Each key or (mod,SNR) pair is assigned {(1000, 2, 128)} array\n",
      "This results in a total of 1000 samples per (mod, SNR) pair, with each sample a (2, 128) array\n"
     ]
    }
   ],
   "source": [
    "#Check samples assigned to keys - each (mod, SNR) pair in dictionary \n",
    "\n",
    "print('**Samples in dataset**')\n",
    "array_shape = []\n",
    "print(\"Dictionary contains\", len(Xd_digital), \"keys, meaning\",len(Xd_digital),\"(mod, SNR) pairs\" )\n",
    "for k,v in Xd_digital.items():\n",
    "    array_shape.append(v.shape)\n",
    "print(\"Each key or (mod,SNR) pair is assigned\",set(array_shape), \"array\") \n",
    "print(\"This results in a total of\",array_shape[0][0], \"samples per (mod, SNR) pair, with each sample a\",\n",
    "      array_shape[0][1:3], \"array\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Separate training and test samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Separate samples for training and testing**\n",
      "Training:Test set ratio =  0.5\n",
      "Each key in training set is assigned {(500, 2, 128)} array\n",
      "Each key in test set is assigned {(500, 2, 128)} array\n",
      "This results in:\n",
      "      training set: a total of 500 samples per (mod, SNR) pair\n",
      "      test set: a total of 500 samples per (mod, SNR) pair\n"
     ]
    }
   ],
   "source": [
    "#Separate samples for training and test sets\n",
    "\n",
    "from sklearn.utils import shuffle\n",
    "\n",
    "print('**Separate samples for training and testing**')\n",
    "\n",
    "Xd_shuffled = dict()\n",
    "        \n",
    "#Randomize the samples assigned to each key \n",
    "# we take each key: (mod type, snr) pair, and shuffle the samples within this group of values\n",
    "for k,v in Xd_digital.items():\n",
    "        v = shuffle(v, random_state=0)\n",
    "        Xd_shuffled.update({k : v})\n",
    "# So Xd_shuffled is just Xd_digital with samples belonging to each key shuffled         \n",
    "#set train:test set ratio\n",
    "train_test = 0.5\n",
    "dict_test = dict()\n",
    "dict_train = dict()\n",
    "test_array_shape = []\n",
    "train_array_shape = []\n",
    "\n",
    "#extract the first 'train_test' fraction of samples to form test set and the rest to form training set\n",
    "for k,v in Xd_shuffled.items(): \n",
    "        dict_test.update({k : v[:int(v.shape[0]*0.5), :]}) \n",
    "        dict_train.update({k : v[int(v.shape[0]*0.5):, :]})\n",
    "# Take each (mod type, SNR) key. From these 1000 samples assigned to the key, take the first half and put them in test set\n",
    "# (we already shuffled these samples earlier)\n",
    "\n",
    "#check samples assgined to each key in dictionary dict_test and dict_train\n",
    "for k,v in dict_test.items(): test_array_shape.append(v.shape)        \n",
    "for k,v in dict_train.items(): train_array_shape.append(v.shape)\n",
    "\n",
    "print(\"Training:Test set ratio = \", train_test)\n",
    "print(\"Each key in training set is assigned\",set(train_array_shape), \"array\")\n",
    "print(\"Each key in test set is assigned\",set(test_array_shape), \"array\")\n",
    "print(\"This results in:\")\n",
    "print(\"      training set: a total of\",train_array_shape[0][0], \"samples per (mod, SNR) pair\")\n",
    "print(\"      test set: a total of\",test_array_shape[0][0], \"samples per (mod, SNR) pair\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From every (modulation, SNR) pair, randomly pick half the samples for training set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Training data (raw data)**\n",
      "(80000, 2, 128) training data,  (80000,) labels\n"
     ]
    }
   ],
   "source": [
    "snrs,mods = map(lambda j: sorted(list(set(map(lambda x: x[j], dict_train.keys())))), [1,0])\n",
    "X_train = []  \n",
    "labels_train = []\n",
    "for mod in mods:\n",
    "    for snr in snrs:\n",
    "        X_train.append(dict_train[(mod,snr)])\n",
    "        for i in range(dict_train[(mod,snr)].shape[0]): \n",
    "            labels_train.append((mod,snr))\n",
    "X_train = np.vstack(X_train)\n",
    "n_samples_train = X_train.shape[0]\n",
    "y_train = np.array(list(map(lambda x: mods.index(labels_train[x][0]), range(n_samples_train))))\n",
    "\n",
    "print(\"**Training data (raw data)**\")\n",
    "print(X_train.shape,\"training data, \", y_train.shape, \"labels\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First 16 features based on 0 cyclic time lag\n",
    "Design features to transform raw data- each sample (2, 128) to feature vector of length 16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Form expert features \n",
    "\n",
    "def form_features(X):\n",
    "    \n",
    "    #Form array of complex numbers; convert each (2,128) sample to a (128,) sample\n",
    "    \n",
    "    n_samples = X.shape[0]\n",
    "    rows = X.shape[1]\n",
    "    vec_len = X.shape[2]\n",
    "    X_complex = []\n",
    "    X_complex = [complex(X[samp_num,0,column],X[samp_num,1,column]) \n",
    "                 for samp_num in range(n_samples) for column in range(vec_len)]\n",
    "    \n",
    "    X_complex = np.vstack(X_complex)\n",
    "    X_complex = np.reshape(X_complex, [n_samples,vec_len])\n",
    "    X_complex_sqr = np.square(X_complex)\n",
    "    X_complex_angl = np.angle(X_complex)\n",
    "    \n",
    "    #Form features: set of 16 expert features based on 0 cyclic time lag\n",
    "\n",
    "    dict_16features = {}\n",
    "    \n",
    "    #Feature 1\n",
    "    dict_16features['f1'] = np.array([np.mean(X_complex, axis=1)]).T\n",
    "\n",
    "    #Feature 2\n",
    "    dict_16features['f2'] = np.array([np.mean(abs(X_complex),axis = 1)]).T\n",
    "    \n",
    "    #Feature 3\n",
    "    dict_16features['f3'] = np.array([np.mean(X_complex_angl, axis = 1)]).T\n",
    "\n",
    "    #Feature 4\n",
    "    dict_16features['f4'] = np.array([np.mean(abs(X_complex_angl), axis = 1)]).T\n",
    "\n",
    "    #Feature 5\n",
    "    dict_16features['f5'] = np.array([np.mean(X_complex_sqr, axis = 1)]).T\n",
    "\n",
    "    #Feature 6\n",
    "    dict_16features['f6'] = np.array([np.mean(np.square(abs(X_complex)), axis = 1)]).T\n",
    "\n",
    "    #Feature 7\n",
    "    dict_16features['f7'] = np.array([np.mean(np.square(X_complex_angl), axis = 1)]).T\n",
    "\n",
    "    #Feature 8\n",
    "    dict_16features['f8'] = np.array([np.mean(np.square(abs(X_complex_angl)), axis = 1)]).T\n",
    "\n",
    "    #Feature 9\n",
    "    dict_16features['f9'] = np.array([(np.mean(X_complex_sqr,axis = 1)) - \n",
    "                   ((1/vec_len**2)*np.square(np.sum(X_complex, axis = 1)))]).T\n",
    "\n",
    "    #Feature 10\n",
    "    dict_16features['f10'] = np.array([(np.mean(np.square(abs(X_complex)), axis = 1)) - \n",
    "                    ((1/vec_len**2)*np.square(np.sum(abs(X_complex), axis = 1)))]).T\n",
    "\n",
    "    #Feature 11\n",
    "    dict_16features['f11'] = np.array([(np.mean(np.square(X_complex_angl),axis = 1)) - \n",
    "                    ((1/vec_len**2)*np.square(np.sum(X_complex_angl, axis = 1)))]).T\n",
    "\n",
    "    #Feature 12\n",
    "    dict_16features['f12'] = np.array([(np.mean(np.square(abs(X_complex_angl)), axis = 1)) - \n",
    "                    ((1/vec_len**2)* np.square(np.sum(abs(X_complex_angl), axis = 1)))]).T\n",
    "\n",
    "    #Feature 13\n",
    "    dict_16features['f13'] = np.array([(np.mean(np.power(X_complex,4), axis = 1)) - \n",
    "                    ((1/vec_len**2)*(np.sum(X_complex_sqr,1))**2)]).T\n",
    "\n",
    "    #Feature 14\n",
    "    dict_16features['f14'] = np.array([(np.mean(np.power(abs(X_complex),4), axis = 1)) - \n",
    "                   ((1/vec_len**2)*(np.sum(np.square(abs(X_complex)), axis = 1))**2)]).T\n",
    "\n",
    "    #Feature 15\n",
    "    dict_16features['f15'] = np.array([(np.mean(np.power(X_complex_angl,4), axis =1)) - \n",
    "                   ((1/vec_len**2)*(np.sum(np.square(X_complex_angl),axis = 1))**2)]).T\n",
    "\n",
    "    #Feature 16\n",
    "    dict_16features['f16'] = np.array([(np.mean(np.power(abs(X_complex_angl),4), axis =1) - \n",
    "                   ((1/vec_len**2)*(np.sum(np.square(abs(X_complex_angl)),axis = 1))**2))]).T \n",
    "\n",
    "    #Concatenate 16 feature arrays\n",
    "    X_16 = []\n",
    "    X_16 += dict_16features.values()\n",
    "    X_16 = abs(np.hstack(np.array(X_16)))\n",
    "    \n",
    "    return X_16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Training data after preprocessing**\n",
      "           - samples have 16 features\n",
      "(80000, 16) training data,  (80000,) labels\n"
     ]
    }
   ],
   "source": [
    "#Preprocess the training data\n",
    "\n",
    "#shuffle the samples\n",
    "X_train, y_train = shuffle(X_train, y_train)\n",
    "\n",
    "#form features \n",
    "X_16_train = form_features(X_train)\n",
    "\n",
    "#standardize the features\n",
    "sc = StandardScaler()\n",
    "sc.fit(X_16_train)\n",
    "X_train_std = sc.transform(X_16_train)\n",
    "\n",
    "print(\"**Training data after preprocessing**\")\n",
    "print(\"           - samples have\", X_train_std.shape[1], \"features\")\n",
    "print(X_train_std.shape,\"training data, \", y_train.shape, \"labels\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create features for the test data\n",
    "Use the same function for creating the test set. But unlike the training set, we keep track of the SNR for each sample in case of test set. This is because the performance/ accuracy of the trained ML algorithm depends on the SNR of the signal, and later an SNR vs. accuracy plot depicts how the algorithm does on samples corresponding to various SNRs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Test data**\n",
      "Separate arrays for samples corresponding to different SNRs\n",
      "Total 20 (4000, 16) arrays for SNR values [-20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]\n"
     ]
    }
   ],
   "source": [
    "#Form and preprocess test data\n",
    "\n",
    "# new defaultdict with keys = SNR values and values = (2,128) samples  \n",
    "test_data = defaultdict(list)\n",
    "test_labels = defaultdict(list)\n",
    "\n",
    "# Extract all samples corresponding to each SNR value\n",
    "# dict_test had keys of the form (mod type, SNR); new dict test_data has keys (SNR)\n",
    "def form_test_data(snr):\n",
    "    for k,v in dict_test.items():\n",
    "        if k[1] == snr: \n",
    "            test_data[snr].append(v)\n",
    "            for x in range(v.shape[0]): \n",
    "                test_labels[snr].append(k[0]) \n",
    "    test_data[snr] = np.vstack(test_data[snr])\n",
    "    test_labels[snr] = np.vstack(test_labels[snr])\n",
    "    n_samples_test = test_data[snr].shape[0]\n",
    "    test_labels[snr] = np.array(list(map(lambda x: mods.index(test_labels[snr][x]), range(n_samples_test))))\n",
    "    return test_data[snr], test_labels[snr]\n",
    "    \n",
    "X_test = defaultdict(list)\n",
    "X_test16 = defaultdict(list)\n",
    "y_test = defaultdict(list)\n",
    "X_test_std = defaultdict(list)\n",
    "\n",
    "# Extract samples and labels for each SNR \n",
    "for snr in snrs:\n",
    "    data, labels = form_test_data(snr)     # extract all samples belonging to this SNR, from dict_test\n",
    "    X_test[snr].append(data)\n",
    "    X_test[snr] = np.vstack(X_test[snr])\n",
    "    y_test[snr].append(labels)             # extract the corresponding labels belonging to this SNR, from dict_test\n",
    "    y_test[snr] = np.hstack(y_test[snr])\n",
    "    X_test[snr], y_test[snr] = shuffle(X_test[snr], y_test[snr])  #shuffle the samples (2, 128)\n",
    "    X_test16[snr] = form_features(X_test[snr])   #form features; each sample is now (16,) feature vector \n",
    "    X_test_std[snr] = sc.transform(X_test16[snr])    #standardize the features\n",
    "    \n",
    "print(\"**Test data**\")\n",
    "print(\"Separate arrays for samples corresponding to different SNRs\")\n",
    "print(\"Total\", len(snrs), X_test_std[18].shape, \"arrays for SNR values\", snrs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Store variables in Jupyter's database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored 'snrs' (list)\n",
      "Stored 'X_train_std' (ndarray)\n",
      "Stored 'X_test_std' (defaultdict)\n",
      "Stored 'y_train' (ndarray)\n",
      "Stored 'y_test' (defaultdict)\n"
     ]
    }
   ],
   "source": [
    "%store snrs\n",
    "%store X_train_std\n",
    "%store X_test_std\n",
    "%store y_train\n",
    "%store y_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next 16 features based on 8 cyclic time lag\n",
    "Design next 16 features to transform raw data- each sample (2, 128) to feature vector of length 16 and then conmbine the previous 16 and these 16 features to form feature set consisting of 32 features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Form expert features \n",
    "\n",
    "def form_next16_features(X):\n",
    "    \n",
    "    #Form array of complex numbers; convert each (2,128) sample to a (128,) sample\n",
    "    \n",
    "    n_samples = X.shape[0]\n",
    "    rows = X.shape[1]\n",
    "    vec_len = X.shape[2]\n",
    "    X_complex = []\n",
    "    X_complex = [complex(X[samp_num,0,column],X[samp_num,1,column]) \n",
    "                 for samp_num in range(n_samples) for column in range(vec_len)]\n",
    "    \n",
    "    X_complex = np.vstack(X_complex)\n",
    "    X_complex = np.reshape(X_complex, [n_samples,vec_len])\n",
    "    \n",
    "    #Form array to use in features based on 8 cyclic time lag\n",
    "\n",
    "    X_complex_8lag = np.zeros((X_complex.shape), dtype=complex)\n",
    "    for samp_num in range(n_samples):\n",
    "        for clmn in range(vec_len):\n",
    "            shift_by = (clmn + 8)%vec_len\n",
    "            X_complex_8lag[samp_num,clmn] = (X_complex[samp_num,clmn])*(X_complex[samp_num,shift_by])\n",
    "            \n",
    "    X_complex8_sqr = np.square(X_complex_8lag)\n",
    "    X_complex8_angl = np.angle(X_complex_8lag)\n",
    "    \n",
    "    #16 expert features based on 8 cyclic time lag\n",
    "    \n",
    "    dict_16features = {}\n",
    "    \n",
    "    #Feature 17\n",
    "    dict_16features['f17'] = np.array([np.mean(X_complex_8lag, axis = 1)]).T\n",
    "\n",
    "    #Feature 18\n",
    "    dict_16features['f18'] = np.array([np.mean(abs(X_complex_8lag),axis = 1)]).T\n",
    "\n",
    "    #Feature 19\n",
    "    dict_16features['f19'] = np.array([np.mean(X_complex8_angl, axis = 1)]).T\n",
    "\n",
    "    #Feature 20\n",
    "    dict_16features['f20'] = np.array([np.mean(abs(X_complex8_angl), axis = 1)]).T\n",
    "\n",
    "    #Feature 21\n",
    "    dict_16features['f21'] = np.array([np.mean(X_complex8_sqr, axis = 1)]).T\n",
    "\n",
    "    #Feature 22\n",
    "    dict_16features['f22'] = np.array([np.mean(np.square(abs(X_complex_8lag)), axis = 1)]).T\n",
    "\n",
    "    #Feature 23\n",
    "    dict_16features['f23'] = np.array([np.mean(np.square(X_complex8_angl), axis = 1)]).T\n",
    "\n",
    "    #Feature 24\n",
    "    dict_16features['f24'] = np.array([np.mean(np.square(abs(X_complex8_angl)), axis = 1)]).T\n",
    "\n",
    "    #Feature 25\n",
    "    dict_16features['f25'] = np.array([(np.mean(X_complex8_sqr,axis = 1)) - \n",
    "                    ((1/vec_len**2)*np.square(np.sum(X_complex_8lag, axis = 1)))]).T\n",
    "\n",
    "    #Feature 26\n",
    "    dict_16features['f26'] = np.array([(np.mean(np.square(abs(X_complex_8lag)), axis = 1)) -\n",
    "                    ((1/vec_len**2)*np.square(np.sum(abs(X_complex_8lag), axis = 1)))]).T\n",
    "\n",
    "    #Feature 27\n",
    "    dict_16features['f27'] = np.array([(np.mean(np.square(X_complex8_angl),axis = 1)) -\n",
    "              ((1/vec_len**2)*np.square(np.sum(X_complex8_angl, axis = 1)))]).T\n",
    "\n",
    "    #Feature 28\n",
    "    dict_16features['f28'] = np.array([(np.mean(np.square(abs(X_complex8_angl)), axis = 1)) - \n",
    "             ((1/vec_len**2)* np.square(np.sum(abs(X_complex8_angl), axis = 1)))]).T\n",
    "\n",
    "    #Feature 29\n",
    "    dict_16features['f29'] = np.array([(np.mean(np.power(X_complex_8lag,4), axis = 1)) - \n",
    "                    ((1/vec_len**2)*(np.sum(X_complex8_sqr,1))**2)]).T\n",
    "\n",
    "    #Feature 30\n",
    "    dict_16features['f30'] = np.array([(np.mean(np.power(abs(X_complex_8lag),4), axis = 1)) - \n",
    "             ((1/vec_len**2)*(np.sum(np.square(abs(X_complex_8lag)), axis = 1))**2)]).T\n",
    "\n",
    "    #Feature 31\n",
    "    dict_16features['f31'] = np.array([((1/vec_len)* np.sum(np.power(X_complex8_angl,4), axis =1)) - \n",
    "                     ((1/vec_len**2)*(np.sum(np.square(X_complex8_angl),axis = 1))**2)]).T\n",
    "\n",
    "    #Feature 32\n",
    "    dict_16features['f32'] = np.array([(np.mean(np.power(abs(X_complex8_angl),4), axis =1) - \n",
    "             ((1/vec_len**2)*(np.sum(np.square(abs(X_complex8_angl)),axis = 1))**2))]).T\n",
    "\n",
    "    #Concatenate 16 feature arrays\n",
    "    X_16 = []\n",
    "    X_16 += dict_16features.values()\n",
    "    X_16 = abs(np.hstack(np.array(X_16)))\n",
    "    \n",
    "    return X_16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Training data after preprocessing**\n",
      "           - samples have 32 features\n",
      "(80000, 32) training data,  (80000,) labels\n"
     ]
    }
   ],
   "source": [
    "#Preprocess the training data\n",
    "\n",
    "#form features \n",
    "X_next16_train = form_next16_features(X_train)\n",
    "\n",
    "# combine the previous 16 features with these next 16 features, to create (n_samples, 32) data\n",
    "X_32_train = np.concatenate((X_16_train, X_next16_train), axis = 1)\n",
    "\n",
    "#standardize the features\n",
    "sc = StandardScaler()\n",
    "sc.fit(X_32_train)\n",
    "X_32train_std = sc.transform(X_32_train)\n",
    "\n",
    "print(\"**Training data after preprocessing**\")\n",
    "print(\"           - samples have\", X_32train_std.shape[1], \"features\")\n",
    "print(X_32train_std.shape,\"training data, \", y_train.shape, \"labels\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "**Test data**\n",
      "Separate arrays for samples corresponding to different SNRs\n",
      "Total 20 (4000, 32) arrays for SNR values [-20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]\n"
     ]
    }
   ],
   "source": [
    "#Form and preprocess test data\n",
    "    \n",
    "X_next16test = defaultdict(list)\n",
    "X_32test_std = defaultdict(list)\n",
    "X_32test = defaultdict(list)\n",
    "\n",
    "for snr in snrs:\n",
    "    X_next16test[snr] = form_next16_features(X_test[snr])   # form the next 16 features\n",
    "    # concatenate the previous 16 and these next 16 features, so total 32 features\n",
    "    X_32test[snr] = np.concatenate((X_test16[snr], X_next16test[snr]), axis = 1)   \n",
    "    X_32test_std[snr] = sc.transform(X_32test[snr])    #standardize the features\n",
    "    \n",
    "print(\"**Test data**\")\n",
    "print(\"Separate arrays for samples corresponding to different SNRs\")\n",
    "print(\"Total\", len(snrs), X_32test_std[18].shape, \"arrays for SNR values\", snrs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_32_train = y_train\n",
    "y_32_test = y_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Store variables in Jupyter's database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored 'X_32train_std' (ndarray)\n",
      "Stored 'X_32test_std' (defaultdict)\n",
      "Stored 'y_32_train' (ndarray)\n",
      "Stored 'y_32_test' (defaultdict)\n"
     ]
    }
   ],
   "source": [
    "%store X_32train_std\n",
    "%store X_32test_std\n",
    "%store y_32_train\n",
    "%store y_32_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View all stored variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Stored variables and their in-db values:\n",
      "X_32test_std              -> defaultdict(<class 'list'>, {-20: array([[-0.45951\n",
      "X_32train_std             -> array([[ 2.11096968, -1.63913221,  2.21713053, ...\n",
      "X_test_std                -> defaultdict(<class 'list'>, {-20: array([[-4.59517\n",
      "X_train_std               -> array([[ 2.11096968, -1.63913221,  2.21713053, ...\n",
      "snrs                      -> [-20, -18, -16, -14, -12, -10, -8, -6, -4, -2, 0, \n",
      "y_32_test                 -> defaultdict(<class 'list'>, {-20: array([1, 3, 5, \n",
      "y_32_train                -> array([4, 4, 6, ..., 1, 7, 0])\n",
      "y_test                    -> defaultdict(<class 'list'>, {-20: array([1, 3, 5, \n",
      "y_train                   -> array([4, 4, 6, ..., 1, 7, 0])\n"
     ]
    }
   ],
   "source": [
    "%store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
